Shedding light on the dark genome: Insights into the genetic, CRISPR-based, and pharmacological dependencies of human cancers and disease aggressiveness

Investigating the human genome is vital for identifying risk factors and devising effective therapies to combat genetic disorders and cancer. Despite the extensive knowledge of the "light genome”, the poorly understood "dark genome" remains understudied. In this study, we integrated data from 20,412 protein-coding genes in Pharos and 8,395 patient-derived tumours from The Cancer Genome Atlas (TCGA) to examine the genetic and pharmacological dependencies in human cancers and their treatment implications. We discovered that dark genes exhibited high mutation rates in certain cancers, similar to light genes. By combining the drug response profiles of cancer cells with cell fitness post-CRISPR-mediated gene knockout, we identified the crucial vulnerabilities associated with both dark and light genes. Our analysis also revealed that tumours harbouring dark gene mutations displayed worse overall and disease-free survival rates than those without such mutations. Furthermore, dark gene expression levels significantly influenced patient survival outcomes. Our findings demonstrated a similar distribution of genetic and pharmacological dependencies across the light and dark genomes, suggesting that targeting the dark genome holds promise for cancer treatment. This study underscores the need for ongoing research on the dark genome to better comprehend the underlying mechanisms of cancer and develop more effective therapies.

The "dark genome", "ignorome", or "Tdark" comprises protein-coding genes with limited or no known function in literature, representing over a third of all genes [28].This vast amount of unexplored genetic information highlights the significant gaps in our understanding of their roles and importance [28][29][30].The lack of knowledge surrounding these genes poses a considerable obstacle to the advancement of personalised medicine, despite the ever-growing comprehension of the human genome [29,31].While our understanding of the human genome has undeniably facilitated the diagnosis of genetic diseases and the development of targeted therapies for various pathological disorders, such as cancer [5], the dark genome represents a substantial challenge to the progress of personalised medicine [28].The inability to decipher the functions and significance of these genes limits our ability to tailor medical treatments based on individual genetic variations, thereby impeding the full realisation of personalised medicine's potential [28].Overcoming this obstacle requires extensive research aimed at unravelling the mysteries of the dark genome, shedding light on its function, and unlocking its therapeutic potential.
Large molecular profiling projects, such as The Cancer Genome Atlas (TCGA) project [32], have extensively profiled human cancers, providing valuable insights into their molecular landscapes and potential therapeutic targets.The Achilles project utilises CRISPR technology to explore gene dependencies in cancer cells [33], whereas the Genomics of Drug Sensitivity in Cancer (GDSC) [34] and Cancer Cell Line Encyclopaedia (CCLE) [35] projects screen thousands of cancer cell lines for small-molecule inhibitor responses.These efforts have generated vast publicly accessible data for advancing cancer understanding and treatment and uncovering the mysteries of the dark genome.The Illuminating the Druggable Genome (IDG) project has compiled dark gene data from over 60 sources to identify new therapeutic targets and advance cancer understanding [30,36].This effort led to the development of the Target Central Resource Database (TCRD), which is a comprehensive database that integrates various data types.To facilitate easy exploration and sharing of this data, a web-based platform called Pharos was created [37].
Integrating the different datasets from TCGA, Achilles, GDSC, CCLE, and IDG projects is essential for illuminating the dark genome and overcoming obstacles posed by unstudied human genes, thus facilitating the advancement of personalised medicine.In this study, we extensively analysed the distribution of genes in both light and dark genomes, investigated the mutation frequency of dark genes across various human cancer types, and assessed their impact on the chemosensitivity of cancer cell lines and aggressiveness of cancer.Our comprehensive analysis aimed to provide valuable insights into the role of the dark genome in human cancers and contribute to a better understanding of the genetic landscape of these diseases.Furthermore, by leveraging information from large-scale molecular profiling projects, we underscore the importance of continued research on the dark genome and the integration of multiple datasets to enhance our understanding of its impact on cancer, ultimately promoting more effective, targeted therapies for cancer and other genetic disorders.

Methods
The study protocol received approval from the University of Zambia; Health Sciences Research Ethics Committee (IRB00011000).The analyses in this study utilised publicly available datasets and de-identified clinical information collected by the TCGA, CCLE, Achilles, IDG, and GDSC projects.These datasets were made accessible through their respective project databases.The methods employed in this study adhered to the relevant policies, regulations, and guidelines established by the TCGA, CCLE, DepMap, IDG, and GDSC projects for the analysis of their datasets and the reporting of findings.
We analysed a dataset comprising 10,528 patient-derived tumours representing 32 distinct human cancers, obtained from cBioPortal [32] version 3.1.9(http://www.cbioportal.org).The acquired data included somatic gene mutations (point mutations and small insertions/deletions), mRNA expression, and comprehensive de-identified clinical data.We further filtered the dataset to include only cancer studies with clinical information for profiled patients.The final datasets encompassed 8,395 patient-derived tumour samples representing 28 distinct human cancer types (see S1 File for cancer study details).

Distribution and research focus on light and dark genes across the target development levels (TDLs)
We obtained human gene classifications based on target development levels (TDLs) from the Pharos [30] interface (version 3.15.1)(https://pharos.nih.gov/).The IDG project classified proteins into four TDLs based on the level of clinical, biological, and chemical investigations conducted on each protein.These TDLs include light genes (Tbio, Tclin, and Tchem) and dark genes (Tdark) [38]: • Tclin proteins are drug targets associated with at least one approved drug, and their mechanism of action (MoA) is known [30,36].
• Tchem proteins do not have established connections to approved drugs based on MoA.However, they are recognised for their exceptional ability to bind with high potency to small molecules, surpassing the bioactivity cutoff values: � 30 nM for kinases, � 100 nM for GPCRs and nuclear receptors, � 10 μM for ion channels, and � 1 μM for other target families [30,38].
• Tbio are proteins with well-studied biology and meet certain criteria, such as having a fractional publication count above 5, being annotated with a Gene Ontology Molecular Function or Biological Process with an Experimental Evidence code and having confirmed OMIM phenotype(s) [30,36].
• Tdark are understudied proteins that do not meet the criteria for the above three categories [36].They meet at least two of the following conditions: A PubMed text-mining score < 5, � 3 Gene RIFs or � 50 Antibodies available per Antibodypedia [38].
We analysed a dataset of 20,412 targets: light genes (Tbio [n = 12,058], Tclin [n = 704], Tchem [n = 1,971]) and dark genes (Tdark [n = 5,679]).Additionally, we compared gene distribution, and TDLs between TCRD versions 4.3.4 and 6.13.4 to investigate gene classification changes over time.We also gathered information on antibody counts, monoclonal antibody counts, and PubMed publication counts from Pharos for further insight into the research focus within each TDL.

Determination of the extent to which dark genes are mutated in human cancer types
To assess the extent of dark gene mutations in cancer compared to light genes, we integrated TDLannotated protein-coding gene information from the Pharos database with the TCGA project [32] dataset of 8,395 patient-derived tumours representing 28 distinct human cancers.We determined the overall mutation frequencies in genes of each developmental level (Tdark, Tclin, Tchem, and Tbio) across the tumours of each cancer type and all cancer types.These analyses enabled us to understand how dark genes are altered within and across the 28 most common human cancer types.

Assessment of the impact of Tdark genes on cancer cell lines
To further evaluate the potential role of dark genes in cancer, we mined datasets from the Achilles project at the DepMap Portal version 21Q1 on the fitness of over 80 cell lines derived from 35 different human cancer types following CRISPR knockouts of 18,333 individual genes.See https://depmap.org/portal/for information on Achilles' CRISPR-derived gene dependency descriptions.Briefly, a lower score indicates a higher likelihood of cell line dependency on a given gene.A score of 0 corresponds to a non-essential gene, while -1 corresponds to the median of all "common essential genes".The database groups genes into four primary categories based on cell line fitness after CRISPR-mediated gene knockouts: • Common essential genes-consistently ranked in the top X most depleted genes in at least 90% of cell lines.
• Essential genes-associated with cell fitness in only one or a few cell lines, but with a dependency skewed-LRT value < 100.
• Non-essential genes-show no effect on the cell fitness in any of the 688 tested cell lines.
We incorporated TDL information from Pharos and CRISPR data to examine the impact of Tdark genes on cancer cell lines.Our assessment involved comparing the mean gene dependency scores obtained from CRISPR data to determine the importance of genes within each developmental stage for cancer cell lines across all cancer types and within each cancer type.
Additionally, we calculated and compared the number of common essential and non-essential genes across the Pharos development levels (S1 File) to understand how gene dependence relates to TDL classes.We then compared the number of PubMed publications, monoclonal antibodies, and polyclonal antibodies available for each group of genes (common essential versus non-essential genes).

Impact of common essential genes on cell fitness
To further investigate gene essentiality, we collected mRNA transcription data for 756 cancer cell lines from the Cancer Cell Line Encyclopaedia (CCLE) database [35].Using CRISPR-derived dependence scores, we identified essential genes across cell lines by counting the number of instances in which each gene's score was less than -0.5.This threshold was chosen based on the recommendation of the Achilles project, indicating reduced cell fitness after CRISPR-mediated gene knockouts.We then tallied instances where the CRISPR-derived dependence score was less than -0.5 for each gene in each cell line to identify genes associated with a reduction in cell fitness across all cell lines.Genes with a significant number of instances were identified as essential, as they were critical for cell survival and growth.Finally, we compared Achilles CRISPR-based fitness scores with transcription profiles.

Evaluation of the extent to which the dark genome affects the chemosensitivity of cancer cells in relation to the light genome
We analysed drug-response data from the Genomics of Drug Sensitivity in Cancer (GDSC) database [34] (www.cancerRxgene.org), to investigate the influence of the dark genome on cancer cell chemosensitivity compared to the light genome.The GDSC database provides comprehensive information on human cancer cell lines treated with a wide range of anticancer drugs that target various signalling pathways.We retrieved the dose-responses for 380 cancer cell lines to 397 drugs that target components of 24 different pathways.Additionally, we used Pharos to obtain information on the TDL annotation of light and dark genes.
We conducted a series of analyses for each pathway gene (e.g., MFSD1) in the context of the CRISPR dependency scores and mutations in cancer cell lines.Firstly, we divided the cell lines into two groups: (1) those with a high CRISPR dependency score on the gene (e.g., MFSD1 dependence score < -0.5), and (2) those with low dependence on that gene (e.g., MFSD1 dependence score > 0.5).Subsequently, we used the Wilcoxon rank-sum test to compare the IC50 values of 397 pathway inhibitors between these two groups of cell lines.Additionally, for each pathway gene, we further categorised the cell lines into two groups: (1) those with mutations in the specific gene (e.g., cell lines with MFSD1 mutations), and (2) those without mutations in that gene (e.g., cell lines with no MFSD1 mutations).Next, we compared the logarithm-transformed IC50 values of each anticancer drug between these two groups of cell lines using the Wilcoxon rank-sum test.
By conducting these analyses, we aimed to determine the extent to which dark and light genes affect cancer chemosensitivity and to identify the types of drugs and targeted pathways that are more relevant for tumours with alterations in specific dark and light genes.

Assessing the impact of dark genome alterations on cancer aggressiveness
To evaluate the influence of dark genome alterations on cancer aggressiveness, we integrated genetic alteration data, including mRNA transcription changes and gene mutations of 28 human cancer types, with clinical outcomes of patients in the TCGA.First, for each cancer type, we segregated patients' tumours into two groups: 1) those without mutations and 2) mutations across all dark genes, or 1) those expressing higher mRNA transcripts of a particular dark gene and 2) those expressing lower mRNA transcripts of a particular dark gene.We then applied the Kaplan-Meier method and Log-rank test to compare the duration of overall survival (OS) and disease-free survival (DFS) between the two groups of tumours across all cancer types.
By comparing the OS and DFS durations, we sought to determine whether alterations in dark genes influence cancer aggressiveness and, if so, the extent of their impact.Furthermore, we investigated the association between mRNA expression levels of specific dark genes and the aggressiveness of disease in each cancer types and across all the cancer types.This assessment allowed us to identify potential dark gene targets that could have clinical significance in the prognosis and treatment of cancer.

Statistical analyses
All statistical analyses were performed using MATLAB R2021a software.Where appropriate, we used the independent sample Student t-test, Welch test, the Wilcoxon rank-sum test and the one-way Analysis of Variance to compare groups of continuous variables.Statistical tests were considered significant at p < 0.05 for single comparisons, whereas the p-values of multiple comparisons were adjusted using the Bonferroni correction method.

Distribution and potential of dark genes as therapeutic targets
We obtained human gene classification information based on target development levels (TDLs) from Pharos [version 3.15.1](https://pharos.nih.gov), a multimodal web interface that presents data from the Target Central Resource Database (TCRD) [30,37].Pharos classified genes/proteins into four TDLs: light genes (Tbio [n = 12,058], Tclin [n = 704], and Tchem [n = 1,971]), and dark genes (Tdark [n = 5,679]) (see S1A Fig and "Methods" section for the description of TDLs).Our evaluation of TDL genes revealed that only 3.5% are currently utilised as drug targets, suggesting that many genes have the potential to be developed as drug targets (Tchem), and that a significant proportion (27.8%) of the dark genome encoding Tdark proteins remains to be understood.This study further filtered the dataset to include genes with information on publications, monoclonal antibodies, and antibody counts.We observed a reduction in dark genes between TCRD versions 4.3.4 and 6.13.4,with version 4.3.4containing 7,003 Tdark genes and version 6.13.4 with 5,679 Tdark genes, indicating a decrease of 1,324 genes (see S1B Fig) .Most genes, including ACTR8, CSMD3, LSM3, and AASDH, which were previously classified as Tdark, are now categorised as Tbio, likely because of new information regarding their functions.This reduction in the number of dark genes between the two database versions reflects the progress in our understanding of the human genome and its potential as a source of novel therapeutic targets.Furthermore, this reveals that what is called "dark genes" currently may have a function after careful analysis and further studies. .These results suggest significant impacts of target development levels on publication, antibody, and monoclonal antibody counts.We further analysed the differences between target development levels using Bonferroni post hoc comparisons, revealing that Tclin had the highest publication count (mean = 4.53) and Tdark the lowest (mean = 1.84).Similarly, Tclin had the highest antibody count (mean = 5.40), and Tdark had the lowest (mean = 2.85).Tclin also had the highest monoclonal antibody count (mean = 3.21), while Tdark had the lowest (mean = 0.39).Our findings suggest that the research focus is consistently high in Tclin, followed by Tchem, Tbio, and Tdark, offering valuable insights into research priorities and trends within different development levels and informing future research efforts.To identify potential differences between essential and non-essential genes in research and development efforts, we assessed the mean publication count, antibody count, and monoclonal antibody count between the common essential (N = 2073) and non-essential (N = 722) genes in the Achilles project dataset.We found significantly higher mean publication counts for common essential genes (mean = 3.25) compared to non-essential genes (mean = 2.56) (Welch test: t = 14.66, p = 1.28 X 10 −44 ) (Fig 2D).Additionally, both antibody and monoclonal antibody counts were significantly higher in common essential genes (mean antibody count = 4.47, mean monoclonal antibody count = 1.91) than in non-essential genes (mean antibody count = 4.02, mean monoclonal antibody count = 1.22), antibody count (t = 8.86, p = 2.97 × 10 −18 ), and monoclonal antibody count (t = 14.66, p = 2.27 × 10 −23 ) (Fig 2E and 2F), suggesting that essential genes receive more research attention.

Dark genes are as frequently mutated across human cancers as light genes
To investigate the extent of mutations in dark and light genes in cancer, we obtained a dataset of 8,395 human cancer cases across 28 primary tumours from TCGA [32], consisting of gene copy number alterations and somatic mutations.We integrated this information with the TDL classification of genes from Pharos and assessed the number of cancers with mutations in each TDL class.Our results revealed that most cancers harboured Tbio gene mutations (99.88%), followed by Tdark (97.71%),Tchem (96.68%), and Tclin (92.94%) (Fig 1B and S1 File).

Dark genes strongly impact the fitness of cancer cells
To assess the importance of dark genes compared with light genes, we used Achilles [33] data to analyse the impact of genes in each target development level (Tclin, Tchem, Tbio, and Tdark) on cancer cell lines.In the Achilles project, gene dependency scores measure the effect of gene perturbation on cell fitness, where a score close to -1 indicates reduced fitness (increased dependency), a score close to 1 indicates increased fitness (reduced dependency), and a score of 0 indicates no change in fitness (independence).
A one-way ANOVA test was performed to compare the mean gene dependency scores among the target development levels, revealing a statistically significant difference, F(3, 14120423) = 5.29 x 10 4 , P < 1 x 10 −300 .The mean gene dependency scores for Tclin, Tchem, Tbio, and Tdark were -0.10, -0.14, -0.17, and -0.07, respectively.Tbio had the lowest mean gene dependency score, signifying the greatest negative impact on cell viability, followed by Tchem, Tclin, and Tdark, with the least negative impact.However, the effect sizes suggest that this difference accounted for a relatively small proportion of the overall variance.Specifically, the eta-squared (η 2 ) value was 0.0111, indicating that 1.11% of the total variance in our outcome variable can be attributed to group membership.The omega-squared (ω 2 ) value was 0.0111, suggesting a similar proportion of explained variance when adjusted for bias.Given the very large sample size of 14,148,860 instances, these small effect sizes suggest that the practical significance of the differences between groups may be limited, despite the statistical significance indicated by the p-value (Fig 3).

Correlation between gene essentiality and mRNA expression of dark genes
The gene essentiality signature of cell lines has been reported to be related to their mRNA transcription signature [48].Therefore, we sought to assess the impact of the common essential dark and light genes frequently mutated in cancer on cell fitness.To this end, we examined the correlation between mRNA expression and dependency scores.Among the top genes with the most significant correlation, we found that, among dark genes, the brain cancer cell lines showed significantly greater dependency on the dark gene NRDE2 than other cell lines (p < 1 x 10 −300 ); Fig 4A).Furthermore, cell lines with NRDE2 mutations exhibited a greater dependency on NRDE2 expression than other cell lines (p < 1 × 10 −300 ).Furthermore, pancreatic cancer cell lines showed significantly greater dependency on the dark gene WDR7 than other cell lines (p < 1 x 10 −300 ); Fig 4B).Additionally, cell lines with WDR7 mutations exhibited a slightly reduced dependency on WDR7 expression than other cell lines (p < 1 × 10 −300 ).This pattern of dependence was similar to that observed for many light genes, including CRNKL1 in breast cancer cell lines (Fig 4C ) and MED14 in leukaemia cell lines (Fig 4D).These findings suggest that both dark and light genes play significant roles in cell fitness and may contribute to the development or progression of specific cancer types (S2 File).Consequently, these results provide valuable insights into the genetic mechanisms involved in these cancers and suggest that these genes could serve as potential targets for the development of novel therapies.Therefore, the CRISPR-based gene editing technology used in the Achilles project has the potential to provide valuable insights for developing new cancer treatments and ultimately improving outcomes for cancer patients [33,[49][50][51][52][53][54][55][56][57].
We further hypothesised that common essential genes are likely to be highly expressed in cancer cell lines.Therefore, we compared the mean transcript levels between the common essential genes and other genes and found that the common essential genes are indeed significantly more highly expressed (Welch t-test; t = 709.7;p < 1 × 10 −300 ); see S2 Fig) in each target development level.Overall, these findings suggest that the "common essential" genes may be a potential target for cancer treatment and further research is warranted to explore this possibility.

Dark and light genes similarly impact the chemosensitivity of cancer cells
The sensitivity of cancer cell lines to pathway inhibitors is influenced by various factors, such as genetic mutations [58,59], the targeted pathway [60][61][62][63], the specific inhibitor used [64,65], and the level of dependence on targeted pathway components [48].Therefore, we investigated whether CRISPR-derived measures of cellular dependence on pathway components correlate with the response of cell lines to existing drug molecules that inhibit these components, which could inform the identification of optimal drug targets within the pathways (see "Methods" section).This study classified cancer cell lines from the GDSC database into two categories based on their CRISPR-derived dependency on genes: one group with a higher CRISPR-derived dependency and the other with a lower dependency.We then compared the mean dose responses of 397 pathway inhibitors (S3A Fig, also see S3 File for the list of inhibitors) between these two groups of cancer cell lines, and only significant results were obtained (p-value < 0.05).We found that 164,225 cases met this criterion, indicating a significant association between pathway inhibitors and the response of cancer cell lines.Notably, these instances encompassed both light genes (Tbio [n = 114,660], and Tchem [n = 21,066]), Tclin [n = 7,173] and dark genes (Tdark [n = 21,326]) (see S3C Fig and S3 File).Among the cell lines dependent on dark genes, we found that the top three inhibitors with the most significant efficacy differences  S3 File).Among the cell lines with mutated dark genes, we found that the top three inhibitors with the most significant efficacy differences between Using a chi-squared test of independence, we examined the potential relationship between the target development level of genes and drug sensitivity in cancer cell lines, obtaining a pvalue of 0.2133.The results indicated a lack of statistical significance in the observed association, suggesting that mutations in no specific gene class had a biased effect on the drug sensitivity of cancer cells.
We further analysed the response of the cell lines to inhibitors targeting 24 different signalling pathways (S3 File).Interestingly, we observed that some cell lines with CRISPR-derived dependency scores for specific pathway genes were sensitive to inhibitors of more than one pathway.For example, cell lines dependent on MEF2C, RUNX1, and ALAD were sensitive to 80%, 44%, and 44% respectively, of the PI3K/mTOR signalling inhibitors.Similarly, these cell lines also displayed sensitivities to 72%, 76%, and 40%, respectively, of the RTK pathway inhibitors profiled by GDSC (also see S3 File).Furthermore, we found that cell lines dependent on TUBB4B and KLF5 showed resistance to PI3K/mTOR signalling inhibitors (44% and 40% respectively) as well as to RTK pathway inhibitors (64% and 68% respectively), as profiled by the GDSC (see S3 File).Additionally, we observed that dark gene-dependent cell lines exhibited notable sensitivity to pathway inhibitors, particularly in the presence of mutations.For example, in cell lines with mutations in the PI3K/mTOR signalling pathway, we found that 14 of the top 50 genes were dark genes, including NPVF, FAM122C, and TMEM144, which were associated with mixed responses (i.e., significantly increased sensitivity to some of the inhibitors and significantly decreased sensitivity to others) (Fig 7B).Whereas cell lines with mutations in the RTK pathway, 10 were dark genes, including ANKRD39, OR7D2, and CCDC43, (7 associated with mixed response, 2 associated with resistance and 1 associated with increased sensitivity) (see S7B Fig).Clustered heatmap; The marks on the heatmap are coloured based on how a high dependence on, or mutations in, the gene along each column affect the efficacy of the drug given along each row: (1) with green denoting significantly (10% false discovery rate) increased sensitivity, (2) grey for no statistically significant difference between cell line with a higher and lower dependence on the gene, or cell line with mutation in a gene and (3) orange denoting significantly increased resistance (for gene dependence and for gene mutations).The gene names (column labels) are coloured based on the overall calculated effect that high dependence on the gene has on the efficacy of the drug given along rows.Green: all the cell lines are significantly more sensitive to all the pathway inhibitors, orange; all the cell lines are significantly more resistant to all the pathway inhibitors, and black; a mixed response to pathway inhibitors.The bar graphs represent the total number of drugs whose dose-response is significantly increased (green) or decreased (orange).https://doi.org/10.1371/journal.pone.0296029.g007 Our findings highlight that CRISPR-derived estimates of the dependency on signalling pathway components can predict the responsiveness of different primary tumour types to pathway inhibitors.This information could be used to develop targeted therapies for cancers with dark gene mutations or a great dependence on dark genes, which may be more sensitive to pathway inhibitors and respond better to treatment.

Dark genes and their impact on cancer patients' survival
We aimed to evaluate the aggressiveness of tumours in patients with genetic alterations in dark genes.We conducted an analysis of disease-free survival (DFS) and overall survival (OS) in cancer patients, considering the mutation status and mRNA expression levels of dark genes to ascertain if specific patient groups exhibited distinct clinical outcomes.We obtained and analysed a pan-cancer dataset from TCGA, which included mRNA expression levels, mutations, and completely de-identified clinical information (refer to the Methods section).
We categorised patients' tumours into two groups based on their mutation status to assess their impact on OS and DFS.The first group, "tumours with Tdark mutation", comprised tumours with mutations in dark genes (3,887 samples), while the second group, "tumours without Tdark mutation," consisted of tumours lacking mutations in dark genes (5,463 samples).We investigated whether the two groups were associated with different clinical outcomes.Using the Kaplan-Meier method [66], we observed that patients with tumours harbouring dark gene mutations had significantly shorter OS periods (OS = 93.83months) than those with tumours lacking dark gene mutations (OS = 126.18months; Fig 8A) (log-rank test; p = 0.002).We also found that DFS periods were significantly shorter (log-rank test; p = 0.008) in patients with tumours containing dark gene mutations than in those without dark gene mutations (Fig 8B).
To analyse OS and DFS based on mRNA expression levels, we employed the z-score normalisation method to segregate patients' tumours into two groups: those with higher expression levels of a specific dark gene and those with lower expression levels of a specific dark gene (refer to the Methods section).These groups were defined as high or low expression according to the mRNA transcript levels of a particular dark gene.Using the Kaplan-Meier method, we found that, of the 3,347 dark genes whose expression varied across 8,395 patient tumours analysed, OS periods were significantly shorter for 978 (29.22%) dark genes in patients with tumours expressing high-mRNA transcripts of a specific dark gene.In comparison, OS periods were significantly shorter for 1,258 (37.59%) dark genes in patients with tumours expressing low-mRNA transcripts of a specific dark gene (see S4 File).DFS analysis revealed that periods were significantly shorter for 755 (22.56%) dark genes in patients with tumours expressing high-mRNA transcripts of a specific dark gene, whereas DFS periods were significantly shorter for 1,011 (30.21%) dark genes in patients with tumours expressing low-mRNA transcripts of a specific dark gene (see S4 File).
Among the dark genes, we found that patients with tumours with high expression of ARL6IP6 exhibited the shortest significant OS duration (OS = 57.07months) compared to those with low gene expression (OS = 127.33months; Fig 8C).We observed that DFS periods were significantly shorter (log-rank test; p = 9.44 x 10 −13 ) for patients with tumours expressing high ARL6IP6 levels than those with low ARL6IP6 expression (Fig 8D Moreover, we investigated the association between the mRNA expression levels of specific dark genes and the aggressiveness of disease in each cancer type and across all the cancer types.We found that, of the top 50 dark genes that exhibited a significant association with reduced OS in each of the 28 cancer types (p-value < 0.05), many genes demonstrated an impact on OS across multiple cancer types.For instance, ABCA11P is associated with reduced OS in both oesophageal carcinoma and glioblastoma multiforme.At the same time, GAGE1 showed a similar effect in five cancer types, including kidney chromophobe, kidney renal papillary cell carcinoma, liver hepatocellular carcinoma, lung squamous cell carcinoma, and prostate adenocarcinoma (see S9 In summary, our survival analyses demonstrated that patients with tumours harbouring mutations in the dark genes had significantly shorter overall survival (OS) and disease-free survival (DFS) than those without such mutations.Similarly, a considerable proportion of dark genes showed that patients with tumours expressing high mRNA transcripts had significantly shorter OS and DFS periods than those with low mRNA transcripts.These findings indicate that both the mutation status and mRNA expression levels of dark genes may be useful prognostic markers for cancer patients.Moreover, our study unveiled dark genes whose mRNA expression levels are associated with the aggressiveness of the disease, both in particular cancer types and across multiple cancer types.The identification of these genes provides valuable insights into potential targets for therapeutic interventions and highlights the interconnectedness of specific genes across different cancer types.

Discussion
In this study, we investigated the potential roles of Tdark genes in cellular processes, cancer development, and progression, and their possible use as drug targets.We observed that Tdark genes constituted a substantial proportion (27.9%) of the genome, highlighting the need for further research to elucidate their function.Our findings also revealed that only a small percentage (3.4%) of Tclin genes are currently utilised as drug targets, indicating a significant potential for developing new drugs targeting relatively well-studied Tchem genes [36,37].
We identified an increase in the number of publications, antibody counts, and monoclonal antibody counts for essential genes compared with non-essential genes, suggesting a focus on essential genes in research [67][68][69][70][71][72].However, we also found that Tdark proteins had the least amount of associated data, indicating a knowledge deficit [28,30,31,37].Furthermore, our study revealed that Tdark genes are mutated as frequently as light genes in human cancers, with Tdark genes being mutated in 98.96% of all cancers analysed, suggesting significant roles in cancer development and progression.
Our investigation identified PKHD1L1 as the most frequently mutated Tdark gene across 28 types of cancer, indicating its potential as a broad therapeutic target.Notably, previous research [73][74][75][76][77][78] has also indicated the substantial involvement of PKHD1L1 in cancer, underscoring its probable significance in the development and progression of this disease.We also reported differences in the mean gene dependency scores among Tclin, Tchem, Tbio, and Tdark genes, suggesting that Tdark genes are also essential for the survival of the cancer cell lines and may be promising therapeutic targets [30,33,79].Furthermore, our study identified critical genes that may be involved in the development and progression of brain cancer, pancreatic cancer, breast cancer, and leukaemia.Therefore, this study sheds light on the genetic mechanisms underlying these cancers, indicating that the identified genes hold promise as potential therapeutic targets.
We found that cancer cell lines with high dependency on pathway genes showed better responses to pathway inhibitors than those with lower dependency.Additionally, cancer cell lines with mutations in specific pathway genes were more responsive to pathway inhibitors than those without mutations, in agreement with a previous study [48].These findings have important implications for the development of targeted therapies and personalised medicine approaches for cancer treatment.Moreover, our findings confirm previous reports that nutlin-3a exhibits sensitivity in cancer cell lines that are dependent on MDM2, while demonstrating resistance in cancer cell lines with TP53 mutations [80,81].These results underscore the significance of considering the molecular characteristics of cancer cells, including MDM2 dependence and TP53 mutation status, when assessing the potential effectiveness of nutlin-3a as a therapeutic intervention.
Furthermore, our study revealed a significant association between mutations in Tdark genes and poorer clinical outcomes in terms of overall survival (OS) and disease-free survival (DFS) in cancer patients.In addition, patients with tumours harbouring Tdark gene mutations had significantly shorter OS and DFS periods than those without such mutations, suggesting that mutations in Tdark genes may be important predictors of clinical outcomes.Moreover, our analysis revealed a strong correlation between the expression of Tdark genes and OS and DFS.Notably, we observed that high expression of ARL6IP6 and low expression of RAI2 were particularly associated with the shortest OS and DFS periods.It is worth noting that previous studies have shown that decreased RAI2 expression is linked to poor prognosis in colorectal cancer [82,83] and breast cancer [84][85][86].These findings underscore the potential significance of Tdark as a valuable prognostic marker in cancer.
In conclusion, our findings underscore the importance of incorporating genetic information into cancer treatment and highlight the potential of personalised medicine approaches.Furthermore, the results demonstrated that Tdark genes are important players in cancer development, warranting further research into their biological functions and potential as targets for cancer therapy.(XLSX) S3 File.Dose-response of cancer cell lines.The spreadsheet contains the following results/ datasets according to the sheet name.Between Cell line Dose Responses: mean difference comparison of the dose-responses to pathway inhibitors between the cancer cell lines that have a higher dependence on dark and light genes and those with a lower dependence on dark and light genes as defined using the CRISPR-derived gene dependence scores (see methods section).Mutations Gene Drug Response: mean dose-response comparison between cell lines that have a mutation(s) in a particular gene versus those that do not have a mutation(s) in that particular gene.Pathways: List of pathways from which our analyses are based.Pathway Inhibitors: List of pathway inhibitors from which our analyses are based.The rest of the sheets contain drug sensitive and resistant genes for all 24 pathways (e.g., PI3K MTOR), PI3K MTOR CRISPR: genes that are associated with significantly increased sensitivity to different pathway inhibitors in cell lines demonstrating higher dependence on the specific gene(s) for their fitness.And genes that are associated with a significant resistance to different pathway inhibitors in cell lines demonstrating higher dependence on the specific gene(s) for their fitness.PI3K MTOR Mutation: genes that are associated with significantly increased sensitivity to different pathway inhibitors in cell lines that have mutations in the specific gene.And genes that are associated with significant resistance to different pathway inhibitors in cell lines that have mutations in the specific gene.(XLSX) The final datasets encompassed 19,387 genes, including light genes (Tbio [n = 11,724], Tclin [n = 689], and Tchem [n = 1,946]) and dark genes (Tdark [n = 5,028]).The distribution of genes within each development level is illustrated in Fig 1A (see S1 File).

Fig 2 .
Fig 2. Research distribution across target development levels.Comparison of the a. publication count, b. antibody count, and c.Monoclonal antibody count among the four target development levels.Comparison of the d.publication count, e. antibody count and f.Monoclonal antibody counts between common essential (n = 2073) and non-essential (n = 722) genes.The boxplots indicate the distribution of the publication, antibody, and monoclonal antibody counts.On each box, the central mark indicates the median, and the bottom edge represents the 25 th percentile, whereas the top edge of the box represents the 75 th percentile.The whiskers extend to the most extreme data points that are not considered outliers.The scatter points within each box plot show the overall distribution of data points.https://doi.org/10.1371/journal.pone.0296029.g002 (50) Tclin genes, and 0.20 (200) Tdark genes were highly mutated (Fig 1C, also see S1C Fig).Our findings suggest numerous potential therapeutic targets for treating these cancer types, particularly Tbio, Tchem and Tdark genes.

Fig 3 .
Fig 3. Comparison of gene dependency scores among target development levels (TDLs) in cancer cell lines from the Achilles project.Boxplots show the gene dependency scores corresponding to the Tbio, Tchem, Tclin, and Tdark TDLs.On each box, the central mark indicates the median, and the bottom edge represents the 25 th percentile, whereas the top edge of the box represents the 75 th percentile.The whiskers extend to the most extreme data points that are not considered outliers.The scatter points within each box plot show the overall distribution of the data points.https://doi.org/10.1371/journal.pone.0296029.g003

Fig 4 .
Fig 4. Relationship between gene essentiality and mRNA expression. A. From left to right: correlation between NRDE2 transcript levels and NRDE2 gene dependence scores, the mean difference in NRDE2 dependence between brain cancer cell lines and all other cancer cell lines, and the mean difference in NRDE2 dependence score between NRDE2 mutant cell lines and cell lines that do not harbour NRDE2 mutations.B. (Left) Correlation between WDR7 transcript levels and WDR7 dependence scores, and the mean difference in WDR7 dependence scores between pancreatic cancer cell lines and all other cancer cell lines.(Right) The mean difference in the WDR7 dependence score between WDR7 mutant cell lines and cell lines that did not harbour WDR7 mutations.C. (Left) correlation between CRNKL1 transcript levels and CRNKL1 dependence scores, and the mean difference in the CRNKL1 dependence score between breast cancer cell lines and all other cancer cell lines.(Right) The mean difference in the CRNKL1 dependence score between CRNKL1 mutant cell lines and cell lines that did not harbour CRNKL1 mutations.D (Left) correlation between MED14 transcript levels and MED14 dependence scores, and the mean difference in the MED14 dependence score between leukaemia cell lines and all other cancer cell lines.(Right) The mean difference in the MED14 dependence score between MED14 mutant cell lines and cell lines that do not harbour MED14 mutations.https://doi.org/10.1371/journal.pone.0296029.g004

Fig 5 .
Fig 5. Relationship between Achilles gene dependence scores and the responses of cell lines to pathway inhibitors for Tdark genes.Comparison of the dose-response profiles to pathway inhibitors (a: Cediranib, b: Sepantronium bromide, c: KRAS(G12C) Inhibitor-12, d: C-75, e: Vinorelbine, f: UNC0638, g: rTRAIL, h: Uprosertib, i: Paclitaxel) between the cancer cell lines with lower dependence (boxplots coloured blue) on signalling pathway and those with higher dependence (boxplots coloured red) on signalling pathway.Boxplots show logarithm-transformed mean IC50 values of the cancer cell lines of each group.On each box, the central mark indicates the median, and the bottom edge represents the 25 th percentile, whereas the top edge of the box represents the 75 th percentile.The whiskers extend to the most extreme data points not considered outliers, and the outliers are plotted individually using the '+' symbol.The scatter points within each box plot show the overall distribution of the data points.https://doi.org/10.1371/journal.pone.0296029.g005

Fig 6 .
Fig 6.Relationship between mutations of different pathway genes and the responses of the cell lines to pathway inhibitors for Tdark genes.Comparison of the dose-response profiles to pathway inhibitors (a: Vinorelbine, b: Daporinad, c: Wee 1 Inhibitor, d: Afatinib, e: Sepantronium bromide, f: Uprosertib, g: Vinblastine, h: PLX-4720, i: Trametinib) between the cancer cell lines without mutation (boxplots blue) on signalling pathway and those with mutation (boxplots coloured red) on signalling pathway.Boxplots show logarithm-transformed mean IC50 values of the cancer cell lines of each group.On each box, the central mark indicates the median, and the bottom edge represents the 25 th percentiles, whereas the top edge of the box represents the 75 th percentile.The whiskers extend to the most extreme data points not considered outliers, and the outliers are plotted individually using the '+' symbol.The scatter points within each box plot show the overall distribution of the data points.https://doi.org/10.1371/journal.pone.0296029.g006

Fig 7 .
Fig 7. PI3K/mTOR pathway.The relationship between gene dependencies (a) or mutations (b) and drug responses across cancer cell lines in the PI3K/mTOR pathway.From top to bottom, panels indicate: Dependence; the overall CRISPR-derived gene dependence scores of the gene along that column (a).Overall mutation frequencies observed for the gene along the columns (b).Clustered heatmap; The marks on the heatmap are coloured based on how a high dependence on, or mutations in, the gene along each column affect the efficacy of the drug given along each row: (1) with green denoting significantly (10% false discovery rate) increased sensitivity, (2) grey for no statistically significant difference between cell line with a higher and lower dependence on the gene, or cell line with mutation in a gene and (3) orange denoting significantly increased resistance (for gene dependence and for gene mutations).The gene names (column labels) are coloured based on the overall calculated effect that high dependence on the gene has on the efficacy of the drug given along rows.Green: all the cell lines are significantly more sensitive to all the pathway inhibitors, orange; all the cell lines are significantly more resistant to all the pathway inhibitors, and black; a mixed response to pathway inhibitors.The bar graphs represent the total number of drugs whose dose-response is significantly increased (green) or decreased (orange).
Fig and S4 File).Notably, a significant proportion of genes (694 [17%]) was associated with reduced OS in three different cancer types (see S10A Fig).Additionally, our analysis revealed that the cancer types with the highest number of significant genes were LGG (Brain lower grade glioma) and KIRC (Kidney renal clear cell carcinoma), 1,251 and 1,135 genes, respectively (see S10B Fig and S4 File).

Fig 8 .
Fig 8. Kaplan-Meier survival curves depicting the impact of dark genes on the survival of patients with cancer.Overall survival periods (a) and diseasefree survival periods (b) of TCGA patients with tumours that had Tdark mutations and tumours without Tdark mutations.Kaplan-Meier curve of the overall survival periods (c) and disease-free survival periods (d) of TCGA patients with tumours that expressed high and low ARL6IP6 transcript levels.https://doi.org/10.1371/journal.pone.0296029.g008

S1File.
Supplementary data of light and dark genes and cancer studies.The spreadsheet contains the following results/datasets according to the sheet name.Pharos Data: The distribution of genes within each development level obtained from Pharos.Cancer studies: List and description of individual cancer studies from which our analyses are based.Cancer Mutations in each TDL: The number of cancers with mutations in each target development level (TDL) class related to Fig 1b.Specific Gene Mutations: Frequency of dark and light gene mutations across all cancer types for each gene.Percent Dark Gene Mutations: Percentage of dark gene mutations in each cancer type for each gene.Percent Light Gene Mutations: Percentage of light gene mutations in each cancer type for each gene.Common Essential Genes: Publication, antibody, and monoclonal antibody counts for common essential genes (both dark and light genes).Non-Essential Genes: Publication, antibody, and monoclonal antibody counts for nonessential genes (both dark and light genes).(XLSX) S2 File.Correlation scores between mRNA expression and gene dependency scores, related to Fig 4.
).Additionally, patients with tumours expressing low RAI2 levels had the shortest significant OS duration (OS = 68.48months) compared to those with high expression of this gene (OS = 174.84months; see S8A Fig).We observed that DFS periods were significantly shorter (log-rank test; p = 6.75 x 10 −19 ) for patients with tumours expressing low RAI2 levels than those with high RAI2 expression (see S8B Fig).